Spatial analysis is a process that begins with exploring and mapping a dataset and can lead to potentially complex models and visualizations of real world features and phenomena. Spatial queries are the building blocks of this analytical process. These queries are software operations that allow us to ask questions of our data and which return data metrics, subsets or new data objects. In this lesson we explore the two basic types of spatial queries: measurement queries and relationship queries.
Instructor Notes
The basic types of spatial queries are:
Both measurement and relationship queries operate on the geometry of features in one or in two datasets and are dependent on the type of geometry. For example, with point features you can make distance measurements or ask what points are spatially inside polygon objects. But it would not make sense to compute the area of a point. Polygon features, on the other hand, allow for a wider range of both measurement and spatial relationship queries.
There are important distinctions between these two types of queries. - Measurement queries always depend on the CRS of the data while spatial relationship queries almost always do not. - Measurement queries return a continuous value (e.g. area) while relationship queries evaluate to true or false, and then return the features for which the relationship is true.
We already know how to do attribute queries with our data. For example, we can select one or more specific counties by name or select those counties where the total population is greater than 100,000 because we have these columns in the dataset.
Spatial queries are special because they are dynamic. For example, we can compute area from the geometry without it already being encoded or we can select BART stations in Berkeley even if city is not encoded in the BART data by linking those two spatial datasets in the same geographic space. This dynamic query capability is extremely powerful!
In this lesson we’ll work through examples of each of those types of queries.
Load the libraries we will use.
library(sf)
library(tmap)
library(here)
Read in the CA Counties data and then take a look at its geometry and attributes.
# Read in the counties shapefile
counties = st_read(dsn = here("notebook_data",
"california_counties",
"CaliforniaCounties.shp"))
## Reading layer `CaliforniaCounties' from data source
## `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/california_counties/CaliforniaCounties.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 24 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022
## Projected CRS: NAD83 / California Albers
counties <- st_make_valid(counties)
plot(counties$geometry)
Take a look at the spatial dataframe.
head(counties,2)
## Simple feature collection with 2 features and 24 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -28402.76 ymin: -357924.6 xmax: 216465.7 ymax: -169685.2
## Projected CRS: NAD83 / California Albers
## NAME STATE_NAME POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE BLACK AMERI_ES
## 1 Kern California 839631 102.9 851089 104.2829 499766 48921 12676
## 2 Kings California 152982 109.9 155039 111.4274 83027 11014 2562
## ASIAN HAWN_PI HISPANIC OTHER MULT_RACE MALES FEMALES MED_AGE AVE_HH_SZ
## 1 34846 1252 413033 204314 37856 433108 406523 30.7 3.15
## 2 5620 271 77866 42996 7492 86344 66638 31.1 3.19
## AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC CountyFIPS
## 1 3.61 284367 29757 152828 101782 06103
## 2 3.59 43867 2634 22329 18904 06089
## geometry
## 1 MULTIPOLYGON (((213672.6 -2...
## 2 MULTIPOLYGON (((12524.03 -1...
Alameda County and save it to a spatial dataframealameda = counties[counties$NAME=='Alameda',]
plot(alameda$geometry)
We’ll start off with some simple measurement queries.
We can get the area of Alameda County with thesf function st_area.
st_area(alameda)
## 1927093261 [m^2]
Okay! We got the area of the county in square meters.
sfuses theunitspackage to manage (get and set) units.
It’s more useful to return the area of large regions in square KM (or sq miles) and we can do that conversion manually by dividing by 1,000,000 (1000 * 1000):
st_area(alameda) / 1000000
## 1927.093 [m^2]
BUT the units package doesn’t respond well to manual unit conversions and still reports the value as m2.
So let’s try that conversion with units.
units::set_units(st_area(alameda), km^2)
## 1927.093 [km^2]
km^2 to?units::set_units(st_area(alameda), km^2) ## WHAT SHOULD YOU CHANGE IT TO?
## 1927.093 [km^2]
It’s a good idea to check one or two measurements before you automate your workflow to make sure you are getting valid values. If we look up the area of Alameda county on wikipedia we get 739 sq mi (1,910 km2). Are the values returned by st_area valid? Why might they differ?
We can also use st_area to add the area of all counties to the spatial dataframe.
counties$areakm2 <- units::set_units(st_area(counties), km^2)
# take a look
head(counties)
## Simple feature collection with 6 features and 25 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6
## Projected CRS: NAD83 / California Albers
## NAME STATE_NAME POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE BLACK
## 1 Kern California 839631 102.9 851089 104.282870 499766 48921
## 2 Kings California 152982 109.9 155039 111.427421 83027 11014
## 3 Lake California 64665 48.6 65253 49.082334 52033 1232
## 4 Lassen California 34895 7.4 35039 7.422856 25532 2834
## 5 Los Angeles California 9818605 2402.3 9904341 2423.264150 4936599 856874
## 6 Madera California 150865 70.1 153025 71.065672 94456 5629
## AMERI_ES ASIAN HAWN_PI HISPANIC OTHER MULT_RACE MALES FEMALES MED_AGE
## 1 12676 34846 1252 413033 204314 37856 433108 406523 30.7
## 2 2562 5620 271 77866 42996 7492 86344 66638 31.1
## 3 2049 724 108 11088 5455 3064 32469 32196 45.0
## 4 1234 356 165 6117 3562 1212 22416 12479 37.0
## 5 72828 1346865 26094 4687889 2140632 438713 4839654 4978951 34.8
## 6 4136 2802 162 80992 37380 6300 72682 78183 33.1
## AVE_HH_SZ AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC CountyFIPS
## 1 3.15 3.61 284367 29757 152828 101782 06103
## 2 3.19 3.59 43867 2634 22329 18904 06089
## 3 2.39 2.94 35492 8944 17472 9076 06106
## 4 2.50 2.98 12710 2652 6590 3468 06086
## 5 2.98 3.58 3445076 203872 1544749 1696455 06073
## 6 3.28 3.63 49140 5823 27726 15591 06102
## geometry areakm2
## 1 MULTIPOLYGON (((213672.6 -2... 21137.940 [km^2]
## 2 MULTIPOLYGON (((12524.03 -1... 3603.706 [km^2]
## 3 MULTIPOLYGON (((-235734.3 1... 3443.291 [km^2]
## 4 MULTIPOLYGON (((12.28914 35... 12225.762 [km^2]
## 5 MULTIPOLYGON (((173874.5 -4... 10585.866 [km^2]
## 6 MULTIPOLYGON (((16681.16 -1... 5577.010 [km^2]
Spatial measurements can differ greatly depending on the CRS. Let’s take a look.
# Calculate area using data in WGS84 CRS (4326)
counties$areakm2_wgs84 <- units::set_units(st_area(st_transform(counties,4326)), km^2)
# Calculate area using data in UTM NAD83 zone 10 CRS (26910)
counties$areakm2_utm <- units::set_units(st_area(st_transform(counties,26910)), km^2)
# Calculate area using data in Web Mercator CRS (3857)
counties$areakm2_web <- units::set_units(st_area(st_transform(counties, 3857)), km^2)
# Take a look at a subset of columns only
head(counties[,c('NAME','areakm2','areakm2_wgs84','areakm2_utm','areakm2_web')])
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6
## Projected CRS: NAD83 / California Albers
## NAME areakm2 areakm2_wgs84 areakm2_utm
## 1 Kern 21137.940 [km^2] 21137.848 [km^2] 21201.508 [km^2]
## 2 Kings 3603.706 [km^2] 3603.104 [km^2] 3608.153 [km^2]
## 3 Lake 3443.291 [km^2] 3440.360 [km^2] 3440.595 [km^2]
## 4 Lassen 12225.762 [km^2] 12210.926 [km^2] 12228.764 [km^2]
## 5 Los Angeles 10585.866 [km^2] 10588.192 [km^2] 10628.068 [km^2]
## 6 Madera 5577.010 [km^2] 5574.650 [km^2] 5584.023 [km^2]
## areakm2_web geometry
## 1 31840.623 [km^2] MULTIPOLYGON (((213672.6 -2...
## 2 5528.030 [km^2] MULTIPOLYGON (((12524.03 -1...
## 3 5725.400 [km^2] MULTIPOLYGON (((-235734.3 1...
## 4 21276.793 [km^2] MULTIPOLYGON (((12.28914 35...
## 5 15559.365 [km^2] MULTIPOLYGON (((173874.5 -4...
## 6 8810.596 [km^2] MULTIPOLYGON (((16681.16 -1...
Let’s discuss the output.
CA Albers: The CRS of our source data is an equal area projected CRS that is optimized for area measurements in CA. So these values in the area_km2 column are highly accurate if the underlying geometry is.
WGS84: When we computed the area based on the data transformed to WGS84 we got almost identical values. WGS84 is a geographic (unprojected) CRS with angular units expressed as decimal degrees. So why is the output accurate? Well, new versions of the sf package use spherical (rather than Euclidean) geometry calculations on geographic data to quickly and accurately compute area and distance. Pretty awesome!
UTM10: This is a CRS optimized for Northern CA areas. So calculations outside the zone, eg SoCal, will be increasingly distorted as the area moves farther from the center of UTM zone 10N.
Web Mercator: This is a CRS that preserves shape and distorts area - so not accurate at all for area calculations.
Check out the help documentation for ?st_area for more information. But the important takeaway is that you need to use a CRS that is appropriate for your analysis/mapping needs!
When creating a spatial analysis work flow it is common to start by transforming all of your data to the same, appropriate CRS.
st_lengthWe can use the st_length operator in the same way to calculate the length features in a spatial dataframe. Always take note of the output units!
# Load BART Lines
bart_lines <- st_read(here('notebook_data', 'transportation', 'bart_lines_2019.geojson'))
## Reading layer `bart_lines_2019' from data source
## `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/transportation/bart_lines_2019.geojson'
## using driver `GeoJSON'
## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -122.4711 ymin: 37.50197 xmax: -121.7726 ymax: 38.02384
## Geodetic CRS: WGS 84
head(bart_lines)
## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -122.4711 ymin: 37.50197 xmax: -121.7726 ymax: 38.02384
## Geodetic CRS: WGS 84
## objectid rec_id operator mode_ class meters Shape__Len
## 1 1385 3 BART Heavy Rail Rapid 127862.570 1.0567986
## 2 1386 3 BART Heavy Rail Rapid 83001.619 0.6562471
## 3 1387 3 BART Heavy Rail Rapid 88328.491 0.7153645
## 4 1388 3 BART Heavy Rail Rapid 74594.947 0.5934920
## 5 1389 3 BART Heavy Rail Rapid 79278.237 0.6695474
## 6 1390 3 BART Heavy Rail Rapid 6672.096 0.0503038
## geometry
## 1 MULTILINESTRING ((-122.3926...
## 2 MULTILINESTRING ((-121.9393...
## 3 MULTILINESTRING ((-121.9393...
## 4 MULTILINESTRING ((-122.3534...
## 5 MULTILINESTRING ((-121.9004...
## 6 MULTILINESTRING ((-122.2124...
bart_lines$len_mi <- units::set_units(st_length(bart_lines), mi)
bart_lines$len_km <- units::set_units(st_length(bart_lines), km)
head(bart_lines)
## Simple feature collection with 6 features and 9 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -122.4711 ymin: 37.50197 xmax: -121.7726 ymax: 38.02384
## Geodetic CRS: WGS 84
## objectid rec_id operator mode_ class meters Shape__Len
## 1 1385 3 BART Heavy Rail Rapid 127862.570 1.0567986
## 2 1386 3 BART Heavy Rail Rapid 83001.619 0.6562471
## 3 1387 3 BART Heavy Rail Rapid 88328.491 0.7153645
## 4 1388 3 BART Heavy Rail Rapid 74594.947 0.5934920
## 5 1389 3 BART Heavy Rail Rapid 79278.237 0.6695474
## 6 1390 3 BART Heavy Rail Rapid 6672.096 0.0503038
## geometry len_mi len_km
## 1 MULTILINESTRING ((-122.3926... 62.628715 [mi] 100.791146 [km]
## 2 MULTILINESTRING ((-121.9393... 40.747035 [mi] 65.575997 [km]
## 3 MULTILINESTRING ((-121.9393... 43.377845 [mi] 69.809875 [km]
## 4 MULTILINESTRING ((-122.3534... 36.593701 [mi] 58.891853 [km]
## 5 MULTILINESTRING ((-121.9004... 38.910937 [mi] 62.621082 [km]
## 6 MULTILINESTRING ((-122.2124... 3.275202 [mi] 5.270927 [km]
The st_distance function can be used to find the distance between two geometries or two sets of geometries.
First let’s read in the bart station data.
bart_stations <- st_read(here('notebook_data', 'transportation','bart_stations_2019.geojson'))
## Reading layer `bart_stations_2019' from data source
## `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/transportation/bart_stations_2019.geojson'
## using driver `GeoJSON'
## Simple feature collection with 48 features and 11 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.4691 ymin: 37.50217 xmax: -121.7799 ymax: 38.01891
## Geodetic CRS: WGS 84
head(bart_stations)
## Simple feature collection with 6 features and 11 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.4475 ymin: 37.72158 xmax: -122.2686 ymax: 37.8528
## Geodetic CRS: WGS 84
## objectid cpt_mode stoptype ts_locatio at_street
## 1 1299 T TS BART - 12th St. Oakland City Center <NA>
## 2 1300 T TS BART - 16th St. Mission <NA>
## 3 1301 T TS BART - 19th St. Oakland <NA>
## 4 1302 T TS BART - 24th St. Mission <NA>
## 5 1303 T TS BART - Ashby <NA>
## 6 1304 T TS BART - Balboa Park <NA>
## on_street sp_citynam agencyname routename station_na
## 1 <NA> <NA> BART <NA> 12th St. Oakland City Center
## 2 <NA> <NA> BART <NA> 16th St. Mission
## 3 <NA> <NA> BART <NA> 19th St. Oakland
## 4 <NA> <NA> BART <NA> 24th St. Mission
## 5 <NA> <NA> BART <NA> Ashby
## 6 <NA> <NA> BART <NA> Balboa Park
## mode geometry
## 1 Rapid Rail POINT (-122.2715 37.80377)
## 2 Rapid Rail POINT (-122.4197 37.76506)
## 3 Rapid Rail POINT (-122.2686 37.80835)
## 4 Rapid Rail POINT (-122.4181 37.75247)
## 5 Rapid Rail POINT (-122.2701 37.8528)
## 6 Rapid Rail POINT (-122.4475 37.72158)
Compute the distance between two BART stations…
st_distance(bart_stations[bart_stations$station_na=='Ashby',],
bart_stations[bart_stations$station_na=='Downtown Berkeley',])
## Units: [m]
## [,1]
## [1,] 1931.225
You can also use it to find the distance between multiple features.
st_distance(bart_stations[bart_stations$station_na=='Downtown Berkeley',], bart_stations)
## Units: [m]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 7381.988 17710.64 6866.866 18567.61 1931.225 22828.39 22928.5 26154.46
## [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
## [1,] 16291.06 14381.92 26989.53 23925.42 25383.09 0 7457.117 37442.66
## [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] 14208.26 43198.64 11275.03 21073.81 27384.11 12853.06 8129.942 4564.345
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
## [1,] 31764.18 14725 1410.575 25979.78 18122.17 7459.668 19711.16 32801.25
## [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40]
## [1,] 4513.825 15540.83 10519.6 3184.935 18980.86 28933.05 30298.1 32109.34
## [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## [1,] 27619.56 38105.04 50110.36 18039.68 35369.22 7631.367 45024.17 37016.97
Spatial relationship queries consider how two geometries or sets of geometries relate to one another in space. For example, you may want to know what schools are located within the City of Berkeley or what East Bay Regional Parks have land within Berkeley. You may also want to combine a measurement query with a spatial relationship query. Example, you may want to know the total length of freeways within the city of Berkeley.
Here is a list of some of the more commonly used sf spatial relationship operations.
These can be used to select features in one dataset based on their spatial relationship to another. In other works, you can use these operations to make spatial selections or create spatial subsets.
Enough talk. Let’s work through some examples.
First, load the CA Places data and select the city of Berkeley and save it to a sf dataframe.
places = st_read(here("notebook_data",
"census",
"Places",
"cb_2019_06_place_500k.shp"))
## Reading layer `cb_2019_06_place_500k' from data source
## `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/census/Places/cb_2019_06_place_500k.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1521 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.2694 ymin: 32.53416 xmax: -114.229 ymax: 41.99314
## Geodetic CRS: NAD83
berkeley = places[places$NAME=='Berkeley',]
plot(berkeley$geometry)
Then, load the Alameda County schools data and make it a spatial dataframe.
schools_df = read.csv(here("notebook_data",
"alco_schools.csv"))
schools_sf = st_as_sf(schools_df,
coords = c('X','Y'),
crs = 4326)
Check that the two sf dataframes have the same CRS.
st_crs(schools_sf) == st_crs(berkeley)
## [1] FALSE
They don’t have the same CRS so we need to align them. Let’s transform (or reproject) the CRS of both of these dataframes to UTM10N, NAD83 (EPSG 26910). This is a commonly used CRS for Northern CA data.
# Transform data CRSes
schools_utm10 <- st_transform(schools_sf, 26910)
berkeley_utm10 <- st_transform(berkeley, 26910)
If you look at the Schools data you will see that it has a City column. So we can subset the data by attribute to select the Schools in Berkeley. No need to do a spatial selection. But let’s do it anyway to demonstrate the process.
Assume that the schools data do not have that city column. How can we identify the schools in Berkeley spatially?
Here’s how!
# SPATIALLY select only the schools within Berkeley
berkeley_school = schools_utm10[berkeley_utm10, , op=st_intersects] # NO QUOTES!!!
Yes that was it! Take a long look at that simple yet powerful spatial selection syntax.
You should interpret that syntax as:
Select the features (i.e. rows) in the schools_utm10 dataframe
whose geometry spatially intersects the Berkeley_utm10 geometry
and keep all of the schools_utm10 columns
The op=st_intersects argument is optional because st_intersects is the default spatial selector.
To emphasize this, let’s rerun the last command.
# SPATIALLY select only the schools within Berkeley
berkeley_schools = schools_utm10[berkeley_utm10, ]
spatially intersects mean?Here’s one way to explain it.
Geometry A spatially intersects Geometry B if any of its parts (e.g., a point, line segment, or polygon) is equivalent to, touches, crosses, is contained by, contains, or overlaps any part of Geometry B.
So st_intersects is the most general of all spatial relationships! It is also the most useful. However, you can specify any of those more specific spatial relationships by setting op= to any of the options listed in the ?st_intersects? help documentation.
Let’s check out the sf object that our selection returned.
# How many schools did we get
dim(berkeley_schools)
## [1] 32 8
# Map the results
plot(berkeley_utm10$geometry)
plot(berkeley_schools$geometry, col="red", add = T)
IMPORTANT: The default spatial selection operator is
st_intersects. If you want to use any other spatial operator - and it is rare that you need to - it must be specified.
For example, we can use the st_disjoint operator to select only those schools NOT in Berkeley.
# Select all Alameda County Schools NOT in Berkeley with the disjoint operator
berkeley_schools_disjoint = schools_utm10[berkeley_utm10, , op = st_disjoint]
# Plot the result
plot(berkeley_schools_disjoint$geometry)
plot(berkeley_utm10,
col = NA,
border = "red",
add = T)
## Warning in plot.sf(berkeley_utm10, col = NA, border = "red", add = T): ignoring
## all but the first attribute
There is no need to memorize these spatial operators (aka predicates)! Here is a fantastic sf cheatsheet that lists and briefly explains all these common functions (and many more).
Let’s load a new dataset, the CA Protected Areas Database (CPAD), to demonstrate these spatial queries in more detail.
This dataset contains all of the protected areas (parks and the like) in California.
We will use this data and the Alameda County census tract data that we created earlier to ask, “What protected areas are within Alameda County?”
First load the CPAD data.
cpad <- st_read(here("notebook_data",
"protected_areas",
"CPAD_2020a_Units.shp"))
## Reading layer `CPAD_2020a_Units' from data source
## `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/protected_areas/CPAD_2020a_Units.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 17068 features and 21 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -374984.2 ymin: -604454.8 xmax: 540016.3 ymax: 449743.2
## Projected CRS: NAD83 / California Albers
What is the CRS of the CPAD data?
Let’s transform the CRS of the CPAD data to match that used by the alameda data (CA Albers).
st_crs(cpad) == st_crs(alameda)
## [1] TRUE
Let’s plot the data in so that we know what to expect. CPAD is big, so wait for it…
plot(alameda$geometry, col = 'grey', border = "grey")
plot(cpad$geometry, col = 'green', add = T)
We can see from our map that some of the protected areas are completely within Alameda County, some of them overlap, and some are completely outside of the county. To get both of the “inside” and “overlaps” cases we use the st_intersects spatial selection operator, which is the default. Let’s check it out.
cpad_intersects <- cpad[alameda,] # st_intersects
cpad_within <- cpad[alameda, , op = st_within] # st_within
We can use tmap to explore the difference in the results from st_intersects vs st_within
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(alameda) +
tm_polygons(col = "gray",
border.col = "grey") +
tm_shape(cpad_intersects) +
tm_borders(col = "green") +
tm_shape(cpad_within) +
tm_borders(col = 'red')
Use the tmap layer control to toggle the intersects and within results and examine them more closely.
What you can see from the above, is that by default, st_intersects returns the features that intersect but it does not clip the features to the boundary of Alameda County. For that, we would need to use a different spatial operation - st_intersection.
Let’s try it!
cpad_in_ac <- st_intersection(cpad, alameda)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
Great! Now, if we scroll the resulting sf object we’ll see that the COUNTY column of our resulting subset gives us a good sanity check on our results. Or does it?
table(cpad_in_ac$COUNTY)
##
## Alameda Contra Costa San Joaquin Santa Clara
## 645 12 1 3
Always check your output - both the attribute table & the geometry!
head(cpad_in_ac)
## Simple feature collection with 6 features and 45 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -198139.5 ymin: -48130.76 xmax: -162337.3 ymax: -22665.86
## Projected CRS: NAD83 / California Albers
## ACCESS_TYP UNIT_ID UNIT_NAME SUID_NMA AGNCY_ID
## 64 Open Access 185 Augustin Bernal Park 8732 1257
## 146 Open Access 366 San Antonio Park 24832 1228
## 218 Open Access 586 Quarry Lakes Regional Recreation Area 30594 2032
## 394 Open Access 1438 Tennis & Community Park 26243 1257
## 409 Open Access 48353 Sean Diamond Park 32917 1090
## 480 Open Access 1631 Tillman Park 26365 1001
## AGNCY_NAME AGNCY_LEV AGNCY_TYP
## 64 Pleasanton, City of City City Agency
## 146 Oakland, City of City City Agency
## 218 East Bay Regional Park District Special District Recreation/Parks District
## 394 Pleasanton, City of City City Agency
## 409 Dublin, City of City City Agency
## 480 Alameda, City of City City Agency
## AGNCY_WEB LAYER
## 64 http://www.cityofpleasantonca.gov/ City
## 146 http://www2.oaklandnet.com/Government/o/opr/index.htm City
## 218 http://www.ebparks.org/ Special District
## 394 http://www.cityofpleasantonca.gov/ City
## 409 http://www.ci.dublin.ca.us/index.aspx?nid=1458 City
## 480 http://alamedaca.gov/recreation City
## MNG_AG_ID MNG_AGENCY MNG_AG_LEV
## 64 1257 Pleasanton, City of City
## 146 1228 Oakland, City of City
## 218 2032 East Bay Regional Park District Special District
## 394 1257 Pleasanton, City of City
## 409 1090 Dublin, City of City
## 480 1001 Alameda, City of City
## MNG_AG_TYP
## 64 City Agency
## 146 City Agency
## 218 Recreation/Parks District
## 394 City Agency
## 409 City Agency
## 480 City Agency
## PARK_URL
## 64 http://www.cityofpleasantonca.gov/services/recreation/rec-augustinpark.html
## 146 <NA>
## 218 <NA>
## 394 <NA>
## 409 https://www.dublin.ca.gov/Facilities/Facility/Details/Sean-Diamond-Park-33
## 480 <NA>
## COUNTY ACRES LABEL_NAME YR_EST DES_TP
## 64 Alameda 217.388 Augustin Bernal Park 0 Local Park
## 146 Alameda 10.619 San Antonio Park 0 Local Park
## 218 Alameda 254.616 Quarry Lakes Reg. Rec. Area 2001 Local Recreation Area
## 394 Alameda 15.595 Tennis & Community Park 0 Local Park
## 409 Alameda 4.986 Sean Diamond Park 2018 Local Park
## 480 Alameda 3.586 Tillman Park 0 Local Park
## GAP_STS NAME STATE_NAME POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE
## 64 4 Alameda California 1510271 2029.8 1534551 2062.402 649122
## 146 4 Alameda California 1510271 2029.8 1534551 2062.402 649122
## 218 4 Alameda California 1510271 2029.8 1534551 2062.402 649122
## 394 4 Alameda California 1510271 2029.8 1534551 2062.402 649122
## 409 4 Alameda California 1510271 2029.8 1534551 2062.402 649122
## 480 4 Alameda California 1510271 2029.8 1534551 2062.402 649122
## BLACK AMERI_ES ASIAN HAWN_PI HISPANIC OTHER MULT_RACE MALES FEMALES
## 64 190451 9799 394560 12802 339889 162540 90997 740573 769698
## 146 190451 9799 394560 12802 339889 162540 90997 740573 769698
## 218 190451 9799 394560 12802 339889 162540 90997 740573 769698
## 394 190451 9799 394560 12802 339889 162540 90997 740573 769698
## 409 190451 9799 394560 12802 339889 162540 90997 740573 769698
## 480 190451 9799 394560 12802 339889 162540 90997 740573 769698
## MED_AGE AVE_HH_SZ AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC
## 64 36.6 2.7 3.3 582549 37411 291242 253896
## 146 36.6 2.7 3.3 582549 37411 291242 253896
## 218 36.6 2.7 3.3 582549 37411 291242 253896
## 394 36.6 2.7 3.3 582549 37411 291242 253896
## 409 36.6 2.7 3.3 582549 37411 291242 253896
## 480 36.6 2.7 3.3 582549 37411 291242 253896
## CountyFIPS geometry
## 64 06068 POLYGON ((-168734.7 -40681,...
## 146 06068 POLYGON ((-197188.2 -22828....
## 218 06068 MULTIPOLYGON (((-176646.1 -...
## 394 06068 POLYGON ((-167580.2 -36255....
## 409 06068 POLYGON ((-162499.7 -31451....
## 480 06068 POLYGON ((-198139.5 -28042....
Let’s also use an overlay plot to check the output geometry.
tm_shape(alameda) +
tm_polygons(col = 'gray',
border.col = 'gray') +
tm_shape(cpad_in_ac) +
tm_polygons(col = 'ACRES',
palette = 'YlGn',
border.col = 'black',
lwd = 0.4,
alpha = 0.8,
title = 'Protected areas in Alameda County, colored by area')
It really depends! But make sure you understand the difference.
st_intersects is a logical operator that returns true if two geometries intersect in any way. When used to subset (or filter) a spatial dataframe, st_intersects returns those features in a dataframe that intersect with the filter dataframe.
On the other hand, st_intersection returns a new spatial dataframe that is the set intersection of the two dataframes, including both the geometries and attributes of the intersecting features. Use st_intersection with caution and always check your results.
It’s your turn.
Write a spatial relationship query to create a new dataset containing only the BART stations in Berkeley.
We already have these two datasets loaded as bart_stations and berkeley_utm.
Check the CRS of the two layers and plot these two datasets in an overlay map. If you need to transform the CRS of the bart_stations, make them match the CRS of berkeley_utm.
Then, write your own code to: 1. Spatially select the BART stations that are within Berkeley 2. Plot the Berkeley boundary and then overlay the selected BART stations.
# Check CRS of bart_stations
st_crs(bart_stations)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
#transform bart stations to make berkeley_utm10
bart_utm10 = st_transform(bart_stations, st_crs(berkeley_utm10))
Plot the data together
plot(bart_utm10$geometry)
plot(berkeley_utm10$geometry, border = 'blue', add=T)
Now, in the cell below, write the code to spatially select the Berkeley BART stations, then make the map.
# YOUR CODE HERE:
# Spatially select the BART stations within Berkeley
# Plot the Bart stations in Berkeley overlaid on top of the Berkeley City Boundary
Now that we’ve seen the basic idea of spatial measurement and spatial relationship queries, let’s take a look at a common analysis that combines those concepts: promximity analysis.
Proximity analysis seeks to identify near features - or, in other words, all features in a focal feature set that are within some maximum distance of features in a reference feature set.
A very common workflow for this analysis is:
Buffer around the features in the reference dataset to create buffer polygons.
Run a spatial relationship query to find all focal features that intersect (or are within) the buffer polygons.
Let’s read in our bike boulevard data again.
Then we’ll find out which of our Berkeley schools are within a block’s distance (200 meters) of the bike boulevards.
bike_blvds <- st_read(here("notebook_data",
"transportation",
"BerkeleyBikeBlvds.geojson"))
## Reading layer `BerkeleyBikeBlvds' from data source
## `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/transportation/BerkeleyBikeBlvds.geojson'
## using driver `GeoJSON'
## Simple feature collection with 211 features and 10 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: 561541.2 ymin: 4189007 xmax: 566451.6 ymax: 4193483
## Projected CRS: WGS 84 / UTM zone 10N
plot(bike_blvds$geometry)
Of course, we need to reproject the boulevards to our projected CRS.
bike_blvds_utm10 = st_transform(bike_blvds, st_crs(berkeley_utm10))
Now we can create our 200 meter bike boulevard buffers.
bike_blvds_buf = st_buffer(bike_blvds_utm10, dist = 200)
Now let’s overlay everything.
tm_shape(berkeley_utm10) +
tm_polygons(col = 'lightgrey') +
tm_shape(bike_blvds_buf) +
tm_polygons(col = 'pink', alpha = 0.5) +
tm_shape(bike_blvds_utm10) +
tm_lines() +
tm_shape(berkeley_schools) +
tm_dots(col = 'purple', size = 0.2)
Great! Looks like we’re all ready to run our spatial relationship query to complete the proximity analysis. At this point (pun intended) select the schools that are in within the bike boulevard buffer polygons.
schools_near_blvds = berkeley_schools[bike_blvds_buf, ]
# or
#schools_near_blvds = berkeley_schools[bike_blvds_buf, , op = 'st_within']
Now let’s overlay again, to see if the schools we selected make sense.
tm_shape(berkeley_utm10) +
tm_borders() +
# add the bike blvd buffer polygons
tm_shape(bike_blvds_buf) +
tm_polygons(col = 'pink', alpha = 0.5) +
# add the bike blvd line features
tm_shape(bike_blvds_utm10) +
tm_lines(lwd=2) +
# add all berkeley schools
tm_shape(berkeley_schools) +
tm_dots(col = 'purple', size = 0.2) +
# add schools near bike boulevards in yellow
tm_shape(schools_near_blvds) +
tm_dots(col = 'yellow', size = 0.2)
You can use st_distance and its companion function st_nearest_feature to compute the distance between each feature and the nearest bike boulevard. The st_nearest_feature function returns the ID of the closest feature.
# Identify the nearest bike boulevard for each school
nearest_blvd = st_nearest_feature(berkeley_schools, bike_blvds_utm10)
# take a look!
nearest_blvd
## [1] 71 171 39 136 42 30 163 172 127 171 189 156 137 33 187 197 50 207 169
## [20] 102 125 137 3 109 207 76 135 173 56 102 146 76
To repeat, the st_nearest_feature function returns the ID of the closest feature. These are stored in nearest_blvd.
Then we can calculate the distance between each school and it’s nearest bike boulevard.
berkeley_schools$bike_blvd_dist <- st_distance(berkeley_schools,
bike_blvds_utm10[nearest_blvd,],
by_element = TRUE)
#take alook
berkeley_schools[order(berkeley_schools$bike_blvd_dist),]
## Simple feature collection with 32 features and 8 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 561890.9 ymin: 4189109 xmax: 566342 ymax: 4194327
## Projected CRS: NAD83 / UTM zone 10N
## First 10 features:
## Site Address City
## 407 Berkeley Adult 1701 San Pablo Ave. Berkeley
## 27 Berkeley Arts Magnet at Whittier 2015 Virginia St Berkeley
## 437 Ecole Bilingue De Berkeley 1009 Heinz Avenue Berkeley
## 32 Leconte Elementary 2241 Russell Street Berkeley
## 442 School of the Madeleine 1225 Milvia Street Berkeley
## 33 Malcolm X Elementary 1731 Prince St Berkeley
## 444 Walden Center and School 2446 McKinley Avenue Berkeley
## 445 West-Wind Academy 1551 University Avenue Berkeley
## 432 Academy, The 2722 Benvenue Avenue Berkeley
## 35 Rosa Parks Environmental Science Magnet 920 Allston Way Berkeley
## State Type API Org geometry bike_blvd_dist
## 407 CA Adult 0 Public POINT (562153.5 4192013) 1.288383 [m]
## 27 CA ES 829 Public POINT (564122 4192361) 13.848161 [m]
## 437 CA K-8 0 private POINT (562512.4 4189869) 15.162313 [m]
## 32 CA ES 748 Public POINT (564890.8 4190253) 15.332395 [m]
## 442 CA K-8 0 private POINT (564013.4 4193297) 15.892361 [m]
## 33 CA ES 834 Public POINT (563890.6 4189674) 27.250406 [m]
## 444 CA K-6 0 private POINT (563898.5 4190992) 92.556921 [m]
## 445 CA 1 to 12 0 private POINT (563251.3 4191717) 93.426741 [m]
## 432 CA K-8 0 private POINT (565543.3 4190686) 94.064527 [m]
## 35 CA ES 735 Public POINT (562038.4 4191134) 107.902846 [m]
Now it’s your turn to try out a proximity analysis!
Write your own code to find all BERKELEY schools within walking distance (1 km) of a BART station.
As a reminder, let’s break this into steps:
bart_utm10 dataframeTo see the solution, look at the hidden text below.
# YOUR CODE HERE:
# spatially select the Berkeley BART stations
# # (You may have done this in a previous exercise.)
# buffer the BART stations to 1 km
# spatially select the schools within the buffers
# map your results with tmap
# # plot the Berkeley boundary (for reference) in lightgrey
# add the BART stations (for reference) to the plot in green
# add the BART buffers (for check) in lightgreen
# add all Berkeley schools (for reference) in black
# add the schools near BART (for check) in yellow
Compute the distance between each Berkeley school and its nearest BART station!
# YOUR CODE HERE:
Leveraging what we’ve learned in our earlier lessons, we got to work with map overlays and start answering questions related to proximity. Key concepts include:
st_area,st_lengthst_distancest_intersects,st_intersectionst_within, etc.st_buffer